“Football is two things. It’s blocking and tackling.”
Vince Lombardi
Although this quotation from one of the most successful head coaches in NFL history dates back several decades, and American football has of course changed since then, tackling remains an integral part of the game. In contrast to other decisions players face, such as a quarterback’s choice of a passing target, the tackling decision is straightforward: a defensive player should always tackle the ball carrier as promptly as possible.
When assessing a tackle, one is usually interested in a counterfactual: what would have happened had the player missed it? Essentially, this amounts to quantifying the yards saved by the defensive player. Ideally, albeit impractically, one would run the play twice, once with the defensive player executing the tackle and once without, and then compare the yardage gained by the ball carrier directly to evaluate the impact of the tackle.
Since such an experiment is impossible, we approximate it by predicting the end-of-play yard line twice. First, we include the closest defender, who executed the tackle (left panel of the figure below); then we exclude this player (right panel of the figure below). However, yards saved alone is not an adequate measure of tackle value, because it is hard to interpret on a scale that is truly relevant to the game outcome. We therefore aim for a measure of tackle value on the scale of expected points (EP). EP can be viewed as a complicated mapping from the end-of-play yard line to the expected points in the next play. A point prediction of the mean yard line alone would not propagate uncertainty to the EP scale, so we produce a full conditional density estimate from which the expected points are calculated. The resulting metric quantifies the prevented expected points (PEP).
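To illustrate why a full density matters on the EP scale, the following minimal sketch averages an EP mapping over samples of the end-of-play position and compares the result with plugging in the mean position. The function `toy_ep`, its coefficients, and the sample values are invented for illustration and are not the EP model used in this work; positions are expressed as yards from the end zone, which is also an assumption of the sketch.

```python
import numpy as np

def toy_ep(yards_to_end_zone):
    """Hypothetical EP curve (an assumption, not a fitted model):
    7 points once the end zone is reached, otherwise decreasing
    linearly with the remaining distance."""
    y = np.asarray(yards_to_end_zone, dtype=float)
    return np.where(y <= 0.0, 7.0, 6.0 - 0.07 * y)

def expected_ep(samples):
    """Average the EP mapping over samples of the end-of-play position."""
    return float(np.mean(toy_ep(samples)))

def prevented_expected_points(with_tackler, without_tackler):
    """PEP sketch: offense's EP without the tackler minus EP with the tackler."""
    return expected_ep(without_tackler) - expected_ep(with_tackler)

# Invented samples of the end-of-play position (yards from the end zone).
with_tackler = np.array([12.0, 12.0, 11.0, 13.0])
without_tackler = np.array([12.0, 0.0, 0.0, 5.0])

pep = prevented_expected_points(with_tackler, without_tackler)

# Plugging in the mean position ignores the probability mass at the end zone.
ep_from_mean = float(toy_ep(without_tackler.mean()))
ep_from_density = expected_ep(without_tackler)
```

Because the toy mapping jumps at the end zone, `ep_from_density` exceeds `ep_from_mean` in this example: the mean position hides the samples that end in a touchdown.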
To accurately predict the yard line at the end of any given play, we need several features derived from the tracking data. More specifically, we carried out the following preprocessing steps:
For each play, we define the x-position of the ball carrier in the last frame as the end-of-play yard line. The response variable we aim to predict is yards to be gained, defined as the difference between the x-position of the ball carrier in a given frame and the end-of-play yard line.
For all players, including the ball carrier, we use the features already contained in the tracking data, namely x- and y-coordinates, speed, acceleration, distance covered, orientation and direction. For every player except the ball carrier, we further compute the Euclidean distance, x-distance and y-distance to the ball carrier. For defensive players only, we additionally compute the absolute difference between the defender’s direction and the angle of the shortest segment between the defender and the ball carrier. Finally, we order all players in each frame by their Euclidean distance to the ball carrier and standardize all features.
More details regarding data preprocessing can be found in the Appendix.
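The distance and angle features described above can be sketched for a single frame as follows. This is a minimal illustration, not the actual preprocessing code: the function name `defender_features` is hypothetical, x- and y-distances are kept signed, directions are assumed to be in degrees measured counterclockwise from the positive x-axis, and the angle difference is wrapped to [0, 180]. Standardization is omitted.

```python
import numpy as np

def defender_features(carrier_xy, defenders_xy, defenders_dir_deg):
    """Return per-defender features sorted by distance to the ball carrier.

    Columns: Euclidean distance, x-distance, y-distance, angle difference.
    Also returns the sort order (indices into the input arrays).
    """
    carrier_xy = np.asarray(carrier_xy, dtype=float)
    defenders_xy = np.asarray(defenders_xy, dtype=float)
    dirs = np.asarray(defenders_dir_deg, dtype=float)

    delta = carrier_xy - defenders_xy            # vector defender -> carrier
    x_dist, y_dist = delta[:, 0], delta[:, 1]    # signed (assumption)
    eucl = np.hypot(x_dist, y_dist)

    # Angle of the segment defender -> carrier, in degrees in [0, 360).
    # Assumption: directions follow the same convention (counterclockwise from +x).
    seg_angle = np.degrees(np.arctan2(y_dist, x_dist)) % 360.0
    diff = np.abs(dirs - seg_angle) % 360.0
    angle_diff = np.minimum(diff, 360.0 - diff)  # wrap to [0, 180]

    feats = np.column_stack([eucl, x_dist, y_dist, angle_diff])
    order = np.argsort(eucl, kind="stable")      # closest defender first
    return feats[order], order

# Invented example: ball carrier at (10, 0) and two defenders.
feats, order = defender_features(
    carrier_xy=(10.0, 0.0),
    defenders_xy=[(0.0, 0.0), (10.0, 3.0)],
    defenders_dir_deg=[350.0, 270.0],
)
```

In this example the second defender is closer (3 yards) and is moving straight at the ball carrier, so it is listed first with an angle difference of 0 degrees.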
Our analysis comprises four steps:
We train a model to predict the yards to be gained, from which we can calculate the end-of-play yard line (see Yurko et al., 2020). The model uses the previously described features, including only the ten closest defenders, and should account for potential non-linear and interaction effects. The time-series nature of the data suggests deep learning architectures such as transformers or LSTMs, which, however, lack uncertainty quantification. Modeling the uncertainty is important here, because the variance of the end-of-play yard line differs substantially between game situations. We therefore set up a conditional density estimator \(\hat{f}(y \mid x)\) and, as a middle ground between accuracy of the mean prediction and uncertainty quantification, use a random forest comprising 1000 individual trees.
Details on the training and testing procedure as well as model evaluation can be found in the Appendix.
For each tackle, we remove the closest defender at the moment of the tackle and replace that player’s features with those of the second closest defender. We then replace the second closest with the third closest, and so on. This yields a prediction for the hypothetical “what if the tackle had been missed” scenario, which can then be compared with the prediction for the actual tackle.
Using the trained random forest, we obtain one prediction of the end-of-play yard line from each of the 1000 trees. Applying a kernel density estimator to these predictions for visualization, we can plot the dynamically evolving conditional density estimate within any given play.
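The first three steps can be sketched as follows. This is a minimal illustration on synthetic data rather than the actual pipeline: scikit-learn's `RandomForestRegressor` is an assumed implementation choice, the forest uses 50 trees instead of 1000 to keep the example fast, and the only feature per defender is an invented distance to the ball carrier. The helpers `per_tree_predictions` and `drop_closest_defender` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def per_tree_predictions(forest, X):
    """Array of shape (n_trees, n_rows): one prediction per tree and row."""
    X = np.asarray(X, dtype=float)
    return np.stack([tree.predict(X) for tree in forest.estimators_])

def drop_closest_defender(sorted_features, n_used):
    """Shift every defender up one slot: the second closest takes the place
    of the closest, the third closest the place of the second, and so on."""
    return np.asarray(sorted_features)[:, 1:n_used + 1]

rng = np.random.default_rng(0)
n_frames, n_defenders, n_used = 500, 11, 10

# Synthetic stand-in for tracking features: per frame, the distances of all
# eleven defenders to the ball carrier, sorted from closest to farthest.
dist = np.sort(rng.uniform(0.5, 30.0, size=(n_frames, n_defenders)), axis=1)
# Toy response: yards to be gained grow with the closest defender's distance.
yards = 0.8 * dist[:, 0] + rng.normal(0.0, 1.0, size=n_frames)

forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=0)
forest.fit(dist[:, :n_used], yards)   # the model sees the ten closest defenders

frame = dist[:1]                                     # a single frame
factual = frame[:, :n_used]                          # tackler included
counterfactual = drop_closest_defender(frame, n_used)  # tackler removed

# One value per tree; together they form the sample behind the density estimate.
pred_factual = per_tree_predictions(forest, factual)[:, 0]
pred_counter = per_tree_predictions(forest, counterfactual)[:, 0]
```

Keeping eleven sorted defenders per frame while the model only uses ten means that removing the closest one still leaves a complete feature vector for the counterfactual prediction.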
For the purpose of illustration, we present a specific example play. The video below shows a successful passing play by the Detroit Lions against the Miami Dolphins. After a completed pass, the ball carrier (in this case tight end T.J. Hockenson) gains a substantial number of yards by evading a tackle and is finally stopped at the 12-yard line.
Below we display an animation of that same play (in the transformed coordinate system, see Appendix). At each frame, we add the conditional density of the yards to be gained from our model. A few observations stand out. First, at the beginning of the play the density is concentrated, because the model expects a tackle from the closest defender. As soon as T.J. Hockenson evades the first tackle, the density changes: the distribution’s variance increases and we even observe bimodality, with a lot of mass at the end zone. Finally, at the time of the tackle the distribution narrows again, as we expect the runner to gain only a few more yards.
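A density curve of this kind can be obtained for a single frame by smoothing the per-tree predictions with a Gaussian kernel. The sketch below is a minimal illustration under stated assumptions: the sample is invented to mimic a bimodal frame (most trees predict a short gain, some a long one), and the bandwidth uses the normal reference rule, which is not necessarily the setting behind the animation.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Normal reference rule; an assumed default, not taken from the source.
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(1)
# Invented stand-in for 1000 per-tree predictions of yards to be gained.
tree_predictions = np.concatenate([
    rng.normal(5.0, 1.0, size=700),    # trees expecting an early tackle
    rng.normal(25.0, 1.5, size=300),   # trees expecting a long gain
])

grid = np.linspace(-20.0, 60.0, 801)
density = gaussian_kde_1d(tree_predictions, grid)

# Sanity checks: total mass and the dip between the two modes.
total_mass = float(density.sum() * (grid[1] - grid[0]))
at_short, at_valley, at_long = np.interp([5.0, 15.0, 25.0], grid, density)
```

Repeating this for every frame of a play, with the per-tree predictions of that frame, gives the sequence of curves that an animation would display.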